Efficient Development of Parallel NLP Applications

Authors

Prateek Jindal

Dan Roth

L. V. Kale

Abstract

Parallel programming is becoming increasingly popular: computers have more and more cores (processors), and large computer clusters are becoming available. However, there is still no good programming framework for these architectures, and thus no simple and unified way for NLP applications to take advantage of the potential speedup. In this paper, we develop a broadly applicable parallel programming method for NLP problems. Our work stands in distinct contrast to the tradition of designing (often ingenious) ways to speed up a single algorithm at a time. Specifically, we show how problems that can be expressed in the LBJ framework [13] can take advantage of parallelization. We use Charm++ [7] to demonstrate the speedup of NLP applications.

Related resources

TectoMT: Modular NLP Framework

In the present paper we describe TectoMT, a multi-purpose open-source NLP framework. It allows for fast and efficient development of NLP applications by exploiting a wide range of software modules already integrated in TectoMT, such as tools for sentence segmentation, tokenization, morphological analysis, POS tagging, shallow and deep syntax parsing, named entity recognition, anaphora resolutio...

Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion

Parallel text is one of the most valuable resources for development of statistical machine translation systems and other NLP applications. The Linguistic Data Consortium (LDC) has supported research on statistical machine translations and other NLP applications by creating and distributing a large amount of parallel text resources for the research communities. However, manual translations are v...

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...

Interlingual Annotation of Parallel Text Corpora: A New Framework for Annotation and Evaluation

This paper focuses on the next step in the creation of a system of meaning representation and the development of semantically-annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to provide parallel corpora annotated with detailed deep ...

An Efficient Parallel Substrate for Typed Feature Structures on Shared Memory Parallel Machines

This paper describes an efficient parallel system for processing Typed Feature Structures (TFSs) on shared-memory parallel machines. We call the system Parallel Substrate for TFS (PSTFS). PSTFS is designed for parallel computing environments where a large number of agents are working and communicating with each other. Such agents use PSTFS as their low-level module for solving constraints on TF...

Publication date: 2010
